
    Dynamic, Task-Related and Demand-Driven Scene Representation

    Humans selectively process and store details about their surroundings based on their knowledge about the scene, the world, and their current task. In doing so, only those pieces of information are extracted from the visual scene that are required for solving the given task. In this paper, we present a flexible system architecture, along with a control mechanism, that allows for a task-dependent representation of a visual scene. In contrast to existing approaches, our system is able to acquire information selectively according to the demands of the given task and the system's knowledge. The proposed control mechanism decides which properties need to be extracted and how the independent processing modules should be combined, based on the knowledge stored in the system's long-term memory. Additionally, it ensures that algorithmic dependencies between processing modules are resolved automatically, utilizing procedural knowledge that is also stored in the long-term memory. By evaluating a proof-of-concept implementation on a real-world table scene, we show that, while solving the given task, the amount of data processed and stored by the system is considerably lower than in the processing regimes used in state-of-the-art systems. Furthermore, our system acquires and stores only the minimal set of information relevant for solving the given task.
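
    The control flow described above lends itself to a small illustration. Below is a minimal sketch, not the authors' implementation: a hypothetical procedural long-term memory maps each scene property to its prerequisite properties, and the control mechanism resolves these algorithmic dependencies to schedule only the modules a given task demands. All module and task names are invented for illustration.

```python
# Minimal sketch of demand-driven module scheduling (hypothetical names).

PROCEDURAL_MEMORY = {
    # property: properties it depends on (procedural long-term memory)
    "regions":  [],
    "color":    ["regions"],
    "shape":    ["regions"],
    "identity": ["color", "shape"],
}

TASK_DEMANDS = {
    # task: properties required to solve it (illustrative)
    "find_red_cup": ["color", "identity"],
}

def resolve(prop, schedule, seen):
    """Depth-first resolution of algorithmic dependencies between modules."""
    if prop in seen:
        return
    seen.add(prop)
    for dep in PROCEDURAL_MEMORY[prop]:
        resolve(dep, schedule, seen)
    schedule.append(prop)

def plan(task):
    """Return a minimal, dependency-ordered processing schedule for a task."""
    schedule, seen = [], set()
    for prop in TASK_DEMANDS[task]:
        resolve(prop, schedule, seen)
    return schedule

print(plan("find_red_cup"))  # ['regions', 'color', 'shape', 'identity']
```

    In this sketch, 'shape' is computed only because 'identity' depends on it; properties no task demands are never extracted, which is what keeps processing and storage minimal.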

    Determinants of Dwell Time in Visual Search: Similarity or Perceptual Difficulty?

    The present study examined the factors that determine dwell times in a visual search task, that is, the duration for which the gaze remains fixated on an object. It has been suggested that an item's similarity to the search target should be an important determinant of dwell times, because dwell times are taken to reflect the time needed to reject the item as a distractor, and such discriminations are supposed to be harder the more similar an item is to the search target. In line with this similarity view, a previous study showed that, in search for a target ring of thin line-width, dwell times on thin line-width Landolt C distractors were longer than dwell times on Landolt Cs with thick or medium line-width. However, dwell times may have been longer on thin Landolt Cs because the thin line-width made it harder to detect whether the stimuli had a gap or not. Thus, it is an open question whether dwell times on thin line-width distractors were longer because they were similar to the target or because the perceptual decision was more difficult. The present study decoupled similarity from perceptual difficulty by measuring dwell times on thin, medium, and thick line-width distractors when the target had thin, medium, or thick line-width. The results showed that dwell times were longer on target-similar than on target-dissimilar stimuli across all target conditions and regardless of line-width. It is concluded that prior findings of longer dwell times on thin line-width distractors can clearly be attributed to target similarity. As discussed towards the end, the finding of similarity effects on dwell times has important implications for current theories of visual search and eye movement control.

    Visual saliency and semantic incongruency influence eye movements when inspecting pictures

    Models of low-level saliency predict that when we first look at a photograph, our first few eye movements should be made towards visually conspicuous objects. Two experiments investigated this prediction by recording eye fixations while viewers inspected pictures of room interiors that contained objects with known saliency characteristics. Highly salient objects did attract fixations earlier than less conspicuous objects, but only in a task requiring general encoding of the whole picture. When participants were required to detect the presence of a small target, the visual saliency of non-target objects did not influence fixations. These results support modifications of the model that take the cognitive override of saliency into account by allowing task demands to reduce the saliency weights of task-irrelevant objects. The pictures sometimes contained incongruent objects taken from other rooms. These objects were used to test the hypothesis that previous reports of the early fixation of incongruent objects have been inconsistent because the effect depends upon the visual conspicuity of the incongruent object. There was an effect of incongruency in both experiments, with earlier fixation of objects that violated the gist of the scene, but the effect was apparent only for inconspicuous objects, which argues against the hypothesis.
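
    The "cognitive override" modification can be made concrete with a toy sketch. The following is an assumption-laden illustration, not the authors' model: task demands multiplicatively reduce the saliency weights of task-irrelevant objects, so the fixation ranking flips between free viewing and search. Object names and all numbers are invented.

```python
# Toy sketch: task demands down-weight saliency of task-irrelevant objects.

saliency = {"lamp": 0.9, "book": 0.4, "small_target": 0.3}   # bottom-up
relevance = {"lamp": 0.0, "book": 0.0, "small_target": 1.0}  # task-driven

def effective_saliency(override):
    """override=0: pure bottom-up ranking; override=1: fully task-driven."""
    return {obj: s * (1 - override * (1 - relevance[obj]))
            for obj, s in saliency.items()}

free_viewing = effective_saliency(override=0.0)
search_task = effective_saliency(override=0.8)
print(max(free_viewing, key=free_viewing.get))  # 'lamp': conspicuity wins
print(max(search_task, key=search_task.get))    # 'small_target': task wins
```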

    Biased Competition in Visual Processing Hierarchies: A Learning Approach Using Multiple Cues

    In this contribution, we present a large-scale hierarchical system for object detection that fuses bottom-up (signal-driven) processing results with top-down (model- or task-driven) attentional modulation. Specifically, we focus on the question of how the autonomous learning of invariant models can be embedded into a performing system and how such models can be used to define object-specific attentional modulation signals. Our system implements bi-directional data flow in a processing hierarchy. The bottom-up data flow proceeds from a preprocessing level to the hypothesis level, where object hypotheses created by exhaustive object detection algorithms are represented in a roughly retinotopic way. A competitive selection mechanism is used to determine the most confident hypotheses, which are used on the system level to train multimodal models that link object identity to invariant hypothesis properties. The top-down data flow originates at the system level, where the trained multimodal models are used to obtain space- and feature-based attentional modulation signals, providing biases for the competitive selection process at the hypothesis level. This results in object-specific hypothesis facilitation or suppression in certain image regions, which we show to be applicable to different object detection mechanisms. To demonstrate the benefits of this approach, we apply the system to the detection of cars in a variety of challenging traffic videos. Evaluating our approach on a publicly available dataset containing approximately 3,500 annotated video images from more than one hour of driving, we show strong increases in performance and generalization compared to object detection in isolation. Furthermore, we compare our results to a late hypothesis rejection approach, showing that early coupling of top-down and bottom-up information is a favorable approach, especially when processing resources are constrained.
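
    The early coupling of the two data streams can be sketched schematically. The snippet below is an illustrative assumption, not the published system: bottom-up detector confidences are multiplied by top-down, model-derived gains before a competitive threshold selects the surviving hypotheses. All hypothesis labels, confidences, and gains are invented.

```python
# Schematic sketch of biased competition over object hypotheses.

def compete(hypotheses, top_down_gain, threshold=0.5):
    """Bias each hypothesis multiplicatively, then keep confident winners."""
    biased = {h: conf * top_down_gain.get(h, 1.0)
              for h, conf in hypotheses.items()}
    return {h: c for h, c in biased.items() if c >= threshold}

# Bottom-up confidences from an exhaustive detector (illustrative numbers).
hypotheses = {"car@(120,40)": 0.55, "car@(300,90)": 0.70, "sign@(50,10)": 0.60}

# Top-down gains from the trained multimodal models: facilitate hypotheses
# whose invariant properties are typical of cars, suppress implausible ones.
top_down_gain = {"car@(120,40)": 1.3, "car@(300,90)": 1.1, "sign@(50,10)": 0.6}

print(compete(hypotheses, top_down_gain))
# {'car@(120,40)': 0.715, 'car@(300,90)': 0.77} -- the implausible
# hypothesis is pruned early rather than by late rejection.
```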

    Spatial and Temporal Dynamics of Attentional Guidance during Inefficient Visual Search

    Spotting prey or a predator is crucial in the natural environment and relies on the ability to quickly extract pertinent visual information. The experimental counterpart of this behavior is visual search (VS), where subjects have to identify a target amongst several distractors. In difficult VS tasks, the reaction time (RT) has been found to be influenced by salience factors, such as target-distractor similarity, and this finding is usually regarded as evidence for guidance of attention by preattentive mechanisms. However, the use of RT measurements, a parameter that depends on multiple factors, allows only very indirect inferences about the underlying attentional mechanisms. The purpose of the present study was to determine the influence of salience factors on attentional guidance during VS by measuring attentional allocation directly. We studied attention allocation using a dual covert VS task in which subjects had to 1) detect a target amongst different items and 2) report letters briefly flashed inside those items at different delays. As predicted, we showed that parallel processes guide attention towards the most relevant item by virtue of both goal-directed and stimulus-driven factors, and we demonstrated that this attentional selection is a prerequisite for target detection. In addition, we show that when the target is characterized by two features (conjunction VS), the goal-directed effects of both features are initially combined into a unique salience value; at a later stage, however, grouping phenomena interact with the salience computation and lead to the selection of a whole group of items. These results, in line with Guided Search Theory, show that efficient and rapid preattentive processes guide attention towards the most salient item, reducing the number of attentional shifts needed to find the target.
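
    The reported combination of goal-directed feature effects into a unique salience value can be illustrated with a toy computation. This is a sketch under assumed weights, not the authors' model: each feature matching the conjunction target adds a fixed goal-directed gain to a stimulus-driven baseline, and attention shifts first to the item with the highest combined salience. Item labels and numbers are invented.

```python
# Toy sketch: combining goal-directed feature matches into one salience value.

items = {
    # item: (matches target color, matches target orientation)
    "color_distractor":       (1, 0),
    "orientation_distractor": (0, 1),
    "target":                 (1, 1),
    "neutral_distractor":     (0, 0),
}

BASE = 0.1    # stimulus-driven baseline salience (assumed)
W_GOAL = 1.0  # goal-directed gain per matching feature (assumed)

salience = {item: BASE + W_GOAL * (c + o) for item, (c, o) in items.items()}
first_shift = max(salience, key=salience.get)
print(first_shift)  # 'target': both feature effects sum to the highest value
```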

    An Image Statistics–Based Model for Fixation Prediction

    The problem of predicting where people look, or equivalently salient region detection, has been related to the statistics of several types of low-level image features. Among these features, contrast and edge information seem to have the highest correlation with fixation locations. The contrast distribution of natural images can be adequately characterized using a two-parameter Weibull distribution, which captures the structure of local contrast and edge frequency in a highly meaningful way. We exploit these observations and investigate whether the parameters of the Weibull distribution constitute a simple model for predicting where people fixate when viewing natural images. Using a set of images with associated eye movements, we assess the joint distribution of the Weibull parameters at fixated and non-fixated regions. We then build a simple classifier based on the log-likelihood ratio between these two joint distributions. Our results show that as few as two values per image region are enough to achieve performance comparable with the state of the art in bottom-up saliency prediction.
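
    Since the abstract fully specifies the pipeline, a compact sketch is possible. The code below is illustrative, with two simplifications that are our assumptions rather than the paper's choices: synthetic (beta, gamma) pairs stand in for the measured eye-movement data, and the joint distributions at fixated and non-fixated regions are approximated by Gaussians rather than estimated non-parametrically.

```python
# Sketch: Weibull contrast statistics + log-likelihood-ratio saliency.
import numpy as np
from scipy.stats import weibull_min, multivariate_normal

rng = np.random.default_rng(0)

def weibull_params(contrasts):
    """Fit scale (beta) and shape (gamma) of a Weibull to local contrasts."""
    gamma, _, beta = weibull_min.fit(contrasts, floc=0.0)
    return beta, gamma

# Hypothetical (beta, gamma) pairs at fixated vs. non-fixated regions.
fix_params = rng.normal([0.30, 1.8], 0.05, size=(200, 2))
nonfix_params = rng.normal([0.15, 1.2], 0.05, size=(200, 2))

# Gaussian approximation of the two joint distributions (a simplification).
p_fix = multivariate_normal(fix_params.mean(0), np.cov(fix_params.T))
p_nonfix = multivariate_normal(nonfix_params.mean(0), np.cov(nonfix_params.T))

def saliency_score(region_contrasts):
    """Log-likelihood ratio: positive values favour 'fixated'."""
    beta, gamma = weibull_params(region_contrasts)
    return float(p_fix.logpdf([beta, gamma]) - p_nonfix.logpdf([beta, gamma]))

# A high-contrast, edgy region should outscore a flat one in this sketch.
edgy = weibull_min.rvs(1.8, scale=0.30, size=500, random_state=1)
flat = weibull_min.rvs(1.2, scale=0.15, size=500, random_state=2)
print(saliency_score(edgy) > saliency_score(flat))  # True
```

    Note that only the two fitted Weibull values per region enter the classifier, which is exactly the "as few as two values" claim above.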

    Chinese characters reveal impacts of prior experience on very early stages of perception

    Visual perception is strongly determined by accumulated experience with the world, as has been shown for shape, color, and position perception, in the field of visuomotor learning, and in neural computation. In addition, visual perception is tuned to the statistics of natural scenes. Such prior experience is modulated by neuronal top-down control, the temporal properties of which have been the subject of recent studies. Here, we deal with these temporal properties and address the question of how early in time accumulated past experience can modulate visual perception.

    Influence of Low-Level Stimulus Features, Task Dependent Factors, and Spatial Biases on Overt Visual Attention

    Visual attention is thought to be driven by the interplay between low-level visual features and the task-dependent information content of local image regions, as well as by spatial viewing biases. Though dependent on experimental paradigms and model assumptions, this idea has given rise to varying claims that either bottom-up or top-down mechanisms dominate visual attention. To contribute toward a resolution of this discussion, here we quantify the influence of these factors and their relative importance in a set of classification tasks. Our stimuli consist of individual image patches (bubbles). For each bubble we derive three measures: a measure of salience based on low-level stimulus features, a measure of salience based on the task-dependent information content derived from our subjects' classification responses, and a measure of salience based on spatial viewing biases. Furthermore, we measure the empirical salience of each bubble based on our subjects' measured eye gazes, thus characterizing the overt visual attention each bubble receives. A multivariate linear model relates the three salience measures to overt visual attention. It reveals that all three salience measures contribute significantly. The effect of spatial viewing biases is highest and rather constant across tasks. The contribution of task-dependent information is a close runner-up; in particular, it scores highly in a standardized task of judging facial expressions. The contribution of low-level features is, on average, somewhat lower. However, in a prototypical search task without an available template, it makes a strong contribution on par with the two other measures. Finally, the contributions of the three factors are only slightly redundant, and the semi-partial correlation coefficients are only slightly lower than the full correlation coefficients. These data provide evidence that all three measures make significant and independent contributions and that none can be neglected in a model of human overt visual attention.
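
    The multivariate linear model at the heart of this analysis is easy to sketch. The code below uses synthetic data whose coefficients merely mimic the qualitative ordering reported above (spatial bias > task information > low-level features); it is not the authors' data or analysis.

```python
# Sketch: linear model relating three salience measures to empirical salience.
import numpy as np

rng = np.random.default_rng(42)
n = 500  # number of bubbles (assumed)

low_level = rng.normal(size=n)     # low-level feature salience
task_info = rng.normal(size=n)     # task-dependent information salience
spatial_bias = rng.normal(size=n)  # spatial viewing-bias salience

# Synthetic empirical salience with the reported qualitative ordering.
empirical = (0.50 * spatial_bias + 0.40 * task_info + 0.25 * low_level
             + rng.normal(scale=0.5, size=n))

X = np.column_stack([np.ones(n), low_level, task_info, spatial_bias])
coef, *_ = np.linalg.lstsq(X, empirical, rcond=None)
print(dict(zip(["intercept", "low_level", "task_info", "spatial_bias"],
               np.round(coef, 2))))
# Recovers roughly {low_level: 0.25, task_info: 0.4, spatial_bias: 0.5}:
# with near-independent predictors, all three contribute separately.
```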

    “I have no clue what I drunk last night”: Using Smartphone technology to compare in-vivo and retrospective self-reports of alcohol consumption.

    This research compared real-time measurements of alcohol consumption with retrospective accounts of alcohol consumption to examine possible discrepancies between, and contextual influences on, the different accounts. Building on previous investigations, specifically designed Smartphone technology was utilized to measure alcohol consumption and contextual influences in de facto real time. Real-time data (a total of 10,560 data points relating to the type and number of drinks and the current social and environmental context) were compared with daily and weekly retrospective accounts of alcohol consumption. Participants reported consuming more alcoholic drinks during real-time assessment than retrospectively. For daily accounts, a higher number of drinks consumed in real time was related to a larger discrepancy between real-time and retrospective accounts. This effect was found across all drink types but was not shaped by social and environmental contexts. Higher in-vivo alcohol consumption appeared to be related to a larger discrepancy in retrospectively reported weekly consumption for alcoholic beverage types other than wine. When contextual factors were included in the statistical models, being with two or more friends (as opposed to being alone) decreased the discrepancy between real-time and retrospective reports, whilst being in the pub (relative to being at home) was associated with greater discrepancies. Overall, retrospective accounts may underestimate the amount of alcohol actually consumed in real time. Increased consumption may also exacerbate differences between real-time and retrospective accounts. Nonetheless, this is not a global effect, as environmental and social contexts interact with the type of alcohol consumed and the time frame given for reporting (weekly vs. daily retrospective). A degree of caution therefore appears warranted with regard to the use of retrospective self-report methods for recording alcohol consumption. Whilst real-time sampling is unlikely to be completely error-free, it may be better able to account for social and environmental influences on self-reported consumption.
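
    The core comparison can be reduced to a small sketch. The records and field names below are hypothetical, and the computation shows only the simplest form of the analysis: the per-report discrepancy between real-time and retrospective drink counts, split by context.

```python
# Toy sketch: real-time vs. retrospective drink counts, split by context.
records = [
    # (participant, real_time_drinks, retrospective_drinks, context)
    ("p1", 6, 4, "pub"),
    ("p1", 2, 2, "home"),
    ("p2", 8, 5, "pub"),
    ("p2", 3, 3, "home"),
]

def mean_discrepancy(rows, context=None):
    """Mean of (real-time - retrospective); positive = under-reporting."""
    diffs = [rt - retro for _, rt, retro, ctx in rows
             if context is None or ctx == context]
    return sum(diffs) / len(diffs)

print(mean_discrepancy(records))          # 1.25: overall under-reporting
print(mean_discrepancy(records, "pub"))   # 2.50: larger discrepancy in pubs
print(mean_discrepancy(records, "home"))  # 0.00: accurate recall at home
```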

    Learned Value Magnifies Salience-Based Attentional Capture

    Visual attention is captured by physically salient stimuli (termed salience-based attentional capture) and by otherwise task-irrelevant stimuli that contain goal-related features (termed contingent attentional capture). Recently, we reported that physically non-salient stimuli associated with value through reward learning also capture attention involuntarily (Anderson, Laurent, & Yantis, PNAS, 2011). Although it is known that physical salience and goal-relatedness both influence attentional priority, it is unknown whether or how attentional capture by a salient stimulus is modulated by its associated value. Here we show that a physically salient, task-irrelevant distractor previously associated with a large reward slows visual search more than an equally salient distractor previously associated with a smaller reward. This magnification of salience-based attentional capture by learned value extinguishes over several hundred trials. These findings reveal a broad influence of learned value on involuntary attentional capture.